autotailor: Add validation for variable names, rule IDs, and profile IDs #2342
Add DataStreamValidator class that validates all IDs against the SCAP datastream before generating tailoring XML. This prevents silent failures from invalid variable names, rule IDs, group IDs, or profile IDs.

Key features:
- Parses datastream to extract valid profiles, values, rules, and groups
- Validates IDs before use in Profile and Tailoring classes
- Provides fuzzy matching suggestions for typos using difflib
- Generates clear error messages with suggestions
- Add --no-validate flag for performance-critical use cases

Performance:
- ~227ms overhead on 20MB datastream (validation enabled by default)
- ~33ms with --no-validate flag (7x faster)
- Validation prevents compliance drift and silent failures

Fixes issue where autotailor accepted arbitrary variable names without validation, creating invalid XML that fails at evaluation time.
Add comprehensive unit tests for the new validation feature:
- test_datastream_validator: Tests validator with valid and invalid IDs for profiles, values, rules, and groups
- test_profile_with_validator: Tests Profile class integration with validator, ensuring invalid IDs are rejected
- test_validator_suggestions: Tests fuzzy matching suggestions for typos in ID names

All tests pass and verify that:
- Valid IDs are accepted
- Invalid IDs are rejected with clear error messages
- Similar valid IDs are suggested for typos
- Validation integrates properly with Profile class
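For readers following along, here is a minimal sketch of how ID validation with difflib-based suggestions could work. It is not the PR's actual implementation: the class name `IdValidator`, the message wording, and the `n=3` / `cutoff=0.6` parameters are assumptions chosen for illustration. Only the `_create_validation_error(kind, id, valid_ids)` call shape and the `ValueError` mirror what is visible in the diff.

```python
import difflib


class IdValidator:
    """Hypothetical stand-in for the PR's DataStreamValidator."""

    def __init__(self, profile_ids):
        # In the PR these IDs are parsed from the data stream; here they are passed in.
        self.profile_ids = set(profile_ids)

    def _create_validation_error(self, kind, bad_id, valid_ids):
        msg = f"{kind} ID '{bad_id}' does not exist in the data stream."
        # get_close_matches returns up to n candidates, most similar first.
        suggestions = difflib.get_close_matches(
            bad_id, sorted(valid_ids), n=3, cutoff=0.6
        )
        if suggestions:
            msg += " Did you mean: " + ", ".join(suggestions) + "?"
        return msg

    def validate_profile(self, profile_id):
        """Raise ValueError if the profile ID is unknown."""
        if profile_id not in self.profile_ids:
            raise ValueError(
                self._create_validation_error("Profile", profile_id, self.profile_ids)
            )
```

Sorting the candidate IDs before matching keeps the suggestion order deterministic, since sets have no stable iteration order.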
    root = tree.getroot()

    # Register namespaces
    namespaces = {
This can be a global variable. It could be used all over the script.
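A minimal sketch of what this suggestion could look like, assuming the standard SCAP 1.2 source data stream and XCCDF 1.2 namespace URIs. The constant is named `DS_NAMESPACES` as in the follow-up commit, but its exact contents in the PR are not shown here, and the helper `extract_profile_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Module-level constant so every function in the script can share it.
# The prefixes and the set of entries are assumptions for this sketch.
DS_NAMESPACES = {
    "ds": "http://scap.nist.gov/schema/scap/source/1.2",
    "xccdf": "http://checklists.nist.gov/xccdf/1.2",
}


def extract_profile_ids(root):
    """Collect the id attribute of every XCCDF Profile under root."""
    return {
        profile.get("id")
        for profile in root.iterfind(".//xccdf:Profile", DS_NAMESPACES)
    }
```

Passing the mapping to `iterfind` lets the XPath use short prefixes instead of repeating the full `{uri}tag` form in each query.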
        self.group_ids = set()
        self._parse_datastream()

    def _parse_datastream(self):
Split this method into several smaller methods.
        help="Skip validation of IDs against the datastream. This significantly speeds up "
             "execution on large datastreams but may produce invalid tailoring files if incorrect "
Suggested change:
-        help="Skip validation of IDs against the datastream. This significantly speeds up "
-             "execution on large datastreams but may produce invalid tailoring files if incorrect "
+        help="Skip validation of IDs against the data stream. This significantly speeds up "
+             "execution on large data streams but may produce invalid tailoring files if incorrect "
        return msg

    def validate_profile(self, profile_id):
        """Validate a profile ID exists in the datastream."""
It's always a "data stream", never a "datastream" or even something worse like "DataStream".
     if args.json_tailoring:
-        t.import_json_tailoring(args.json_tailoring)
+        try:
+            t.import_json_tailoring(args.json_tailoring)
Is the JSON tailoring also validated?
        help="Use local path for the benchmark href instead of absolute file:// URI. "
             "Absolute paths are converted to basename, relative paths are preserved.")
    parser.add_argument(
        "--no-validate", action="store_true",
Add this option to the man page.
        if profile_id not in self.profile_ids:
            raise ValueError(self._create_validation_error("Profile", profile_id, self.profile_ids))

    def validate_value(self, value_id):
It would be amazing if the tool also verified the validity of the selectors used in the -V CLI option.
For example:
utils/autotailor -V xccdf_org.ssgproject.content_value_var_selinux_state=OOO /usr/share/xml/scap/ssg/content/ssg-fedora-ds.xml cis
This command should fail because the XCCDF Value doesn't contain the selector OOO. Currently the selector isn't checked, so the command succeeds and creates an invalid tailoring file.
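One way such a selector check could work is sketched below, assuming XCCDF 1.2 content where each `Value` element lists its options as child `value` elements with a `selector` attribute. The function names `collect_selectors` and `validate_selector` and the error wording are hypothetical and are not taken from the PR.

```python
import xml.etree.ElementTree as ET

# Assumed XCCDF 1.2 namespace; the prefix is arbitrary.
XCCDF_NS = {"xccdf": "http://checklists.nist.gov/xccdf/1.2"}


def collect_selectors(root):
    """Map each XCCDF Value ID to the set of selectors it defines."""
    selectors = {}
    for value in root.iterfind(".//xccdf:Value", XCCDF_NS):
        selectors[value.get("id")] = {
            option.get("selector")
            for option in value.iterfind("xccdf:value", XCCDF_NS)
            # The default <value> has no selector attribute; skip it.
            if option.get("selector")
        }
    return selectors


def validate_selector(selectors, value_id, selector):
    """Raise ValueError if value_id is unknown or lacks the given selector."""
    if value_id not in selectors:
        raise ValueError(f"Value ID '{value_id}' does not exist in the data stream.")
    if selector not in selectors[value_id]:
        available = ", ".join(sorted(selectors[value_id])) or "(none)"
        raise ValueError(
            f"Value '{value_id}' has no selector '{selector}'. "
            f"Available selectors: {available}"
        )
```

With a check along these lines, the `var_selinux_state=OOO` example above would be rejected before any tailoring file is written.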
- Move XML namespace dict to module-level DS_NAMESPACES constant
- Split _parse_datastream into _extract_profiles/values/rules/groups
- Add selector validation for -V/--var-select option
- Fix terminology: "datastream" -> "data stream" in all user-facing text
- Add --no-validate option to the man page
@jan-cerny I've addressed the feedback on ef1d59d
jan-cerny left a comment
I have tried various combinations of parameters (selections, unselections, variables, selectors) and verified that they are validated against the data stream. I also used the --no-validate option. Tests passed both locally and in CI.



Fixes: https://redhat.atlassian.net/browse/RHEL-143568